This listing of claims will replace all prior versions, and listings, of claims in the application:

Claim 1. (currently amended) A system for [[simultaneously]] annotating a group of subsets of
genes with words and phrases that characterize each individual subset of genes and that also
distinguish [[each]] said individual subset of genes from [[the other subsets]] all other individually
considered subsets of genes, in terms of different words and phrases that the system attaches to
different individual subsets of genes, comprising:

(a) means for identifying a set of genes [[that generally have no sequence similarities among

(b) means for partitioning the set of genes in (a) into a group of disjoint subsets of genes known
as clusters;

(c) means for associating a set of literature documents with each gene in the set of
genes in (a), as a prerequisite means for associating [[a set of documents with each of the
clusters in (b)]] each individual subset of genes that is known in (b) as a cluster, with a
superset of literature documents for each such individual subset of genes, wherein said
superset consists of all sets of literature documents that through the previous means [illegible]
been associated with a gene that is a member of said individual subset of genes;
(d) means for receiving the text of part or all of each of the literature documents in (c), as a means
for constructing, for each individual subset of genes known in (b) as a cluster, a compilation
of text consisting of all of the received text of all of the literature documents within
the superset of literature documents constructed in (c) for said individual subset of genes;

(e) means for assigning numerical weights to words or phrases contained in [[the text received]]
the compilations of text constructed in (d), said assignment being made by [[partitioning the
documents according to their association with each cluster as provided in (b) and (c) as
provided in [illegible], followed by]] the application, to the words or phrases in those [[partitioned
documents received as provided in (d)]] compilations of text constructed in (d), of any of the
word weight-setting methods that are implemented in the computer program Rainbow [[that
are intended to simultaneously select words and phrases that characterize each cluster and to
distinguish each cluster from the others]], said application being performed to annotate the
group of subsets of genes known as clusters in (b), with words and phrases that characterize
each individual subset of genes known as a cluster in (b) and that also distinguish said
individual subset of genes from each other individually considered subset of genes known as a
cluster in (b), in terms of different words and phrases that the system attaches to different
individual subsets of genes;
(f) means for sorting, storing, and displaying the words and phrases contained in [[documents]]
each compilation of text that was constructed in (d) and that is associated with [[each of the
clusters provided in (b)]] an individual subset of genes known as a cluster in (b), said sorting
being based on the numerical weights assigned to the words and phrases as provided in (e),
and said storage and display allowing words and phrases to be arranged in the order of their
sorted numerical weights;

whereby the words and phrases having the greatest numerical weights for each [[cluster]]
individual subset of genes known as a cluster in (b) provide an indication of the concepts,
structures, functions, and processes with which said [[cluster]] individual subset of genes known
as a cluster in (b), is most particularly associated, and with said [[cluster]] individual subset
of genes is also distinguished [[from the other clusters,]] from other individually considered
subsets of genes known as clusters in (b), in terms of different words and phrases that the system
attaches to different individual subsets of genes known as clusters in (b).
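
The weighting and sorting steps recited in (e) and (f) can be illustrated with a minimal sketch. This is not Rainbow and not the claimed means: it assumes a smoothed log-odds weight as a stand-in for whichever Rainbow word weight-setting method is selected, and the function name `rank_cluster_words` and its parameters are hypothetical.

```python
import math
from collections import Counter

def rank_cluster_words(compilations, top_n=3):
    """Rank words per cluster by a smoothed log-odds weight (an assumed stand-in).

    compilations: dict mapping a cluster id to its compiled text, as in (d).
    Returns a dict mapping each cluster id to a list of (word, weight) pairs,
    sorted from the greatest weight down, as in (f).
    """
    counts = {c: Counter(text.lower().split()) for c, text in compilations.items()}
    total = Counter()
    for cnt in counts.values():
        total.update(cnt)
    vocab_size = len(total)
    grand_total = sum(total.values())

    ranked = {}
    for c, cnt in counts.items():
        n_in = sum(cnt.values())       # words inside this cluster's compilation
        n_out = grand_total - n_in     # words in all other compilations
        weights = {}
        for word in total:
            # Add-one smoothing keeps both probabilities strictly positive.
            p_in = (cnt[word] + 1) / (n_in + vocab_size)
            p_out = (total[word] - cnt[word] + 1) / (n_out + vocab_size)
            weights[word] = math.log(p_in / p_out)
        # Sort by descending weight; ties are broken alphabetically.
        ranked[c] = sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]
    return ranked
```

A word that is frequent in one compilation and rare in the others receives a large positive weight for that cluster, while a word shared evenly by all compilations receives a weight near zero.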

Claim 2. (canceled) A system for evaluating the quality of gene clustering, comprising:

(a) means for identifying a set of genes; 

(b) means for partitioning the set of genes in (a) into subsets known as clusters;

(c) means for associating a set of documents with each gene in the set of 

genes in (a), and consequently a means for associating a set of documents with 
each of the clusters in (b);

(d) means for partitioning the set of documents in (c) into two subsets, a training
subset and a testing subset;

(e) means for receiving the text of part or all of each of the training subset
documents in (d);

(f) means for receiving the text of part or all of each of the testing subset
documents in (d); 

(g) means for using words or phrases in the text of documents in (e) to train a document
classifier, said training being accomplished by partitioning the documents according
to their association with each cluster as provided in (b) and (c), followed by the
parameter-fitting, using the words or phrases in those partitioned documents, of any of
the document classifiers that are implemented in the computer program Rainbow;

(h) means for using words or phrases in the text of each document in (f) to test the trained
document classifier in (g), wherein the classifier predicts the cluster with which the
test document is associated;

(i) means for the option of calculating and storing the fractions of test documents in (d)
known to correspond to each cluster as provided in (b) and (c), that are correctly
predicted to be associated with each cluster, upon testing with the document
classifier as provided in (h);

(j) means for the option of repeatedly and randomly partitioning documents in (c) into
training and test subsets as provided in (d), for using each such partitioning to
calculate a fraction of correct classifications for each cluster as provided in (e)-(i),
and for storing said fractions for each and every such random partitioning of
documents into training and test subsets;
(k) means for the option of repeatedly and randomly partitioning the set of genes in (a)
into subsets, wherein the sizes of the random subsets are matched to the sizes
of the clusters as provided in (b); for re-associating a set of documents with each
gene in the set as in (c), and consequently associating a set of documents with
each of the randomly partitioned subsets of genes; for making available means
(d)-(i) so as to be able to calculate a fraction of correct classifications for each
of the random partitions that are matched to the clusters as provided in (b); and
for storing said fractions for each and every such random partitioning of the set
of genes in (a);

(l) means for the option of calculating a measure of central tendency, such as mean or
median, for the fractions that were generated by repeated, random partitioning of
documents in (j), and for the fractions that were generated by repeated, random
partitioning of the set of genes in (k); and for calculating a figure-of-merit for
each cluster as the numerical difference between said measure of central tendency
obtained from (j) and (k);

whereby said figure-of-merit for each cluster provides an indication of the extent to
which some words and phrases, present in documents associated with genes in said
cluster, collectively distinguish that cluster from all the other clusters, and
whereby said figure-of-merit for each cluster provides an indication of the extent
to which the annotations produced by the system of Claim 1 distinguish the clusters,
and whereby said figure-of-merit for each cluster provides an indication of the quality
of that cluster.
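
The arithmetic behind the per-cluster fraction in (i) and the figure-of-merit in (l) can be sketched as follows. This is an illustrative sketch only: the function names `fraction_correct` and `figure_of_merit` are hypothetical, the median is an assumed choice of central tendency, and the classifier itself (trained with Rainbow in the claim) is taken as already applied.

```python
from statistics import median

def fraction_correct(true_clusters, predicted_clusters, cluster):
    """Fraction of test documents known to belong to `cluster` that the
    classifier also predicted as `cluster`, as in (i).

    Assumes at least one test document belongs to `cluster`.
    """
    hits = [pred == cluster
            for true, pred in zip(true_clusters, predicted_clusters)
            if true == cluster]
    return sum(hits) / len(hits)

def figure_of_merit(cluster_fractions, random_fractions, central=median):
    """Difference between the central tendency of the fractions obtained for
    the real cluster, as in (j), and for size-matched random gene subsets,
    as in (k)."""
    return central(cluster_fractions) - central(random_fractions)
```

A figure-of-merit near zero would suggest that the cluster's documents are no easier to classify than those of a random gene subset of the same size.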

Claim 3. (canceled) A system for evaluating the quality of gene clustering, comprising:

(a) means for identifying a set of genes;

(b) means for partitioning the set of genes in (a) into subsets known as clusters;

(c) means for associating a set of documents with each gene in the set of 
genes in (a);

(d) means for calculating for every pair of genes within a cluster in (b) a coupling
strength index, said index being proportional to the number of times that any
document in (c) is associated with both members of said pair of genes; and for
storing said set of index values for every cluster;

(e) means for repeatedly and randomly partitioning the set of genes in (a) into subsets,
wherein the sizes of the random subsets are matched to the sizes of the clusters as
provided in (b); for re-associating a set of documents with each gene in the set as in
(c); for making available means (d) so as to be able to calculate coupling strength
indices; and for storing said set of index values for every such random subset;

(f) means for calculating for every cluster in (b) and random subset in (e) a measure of

central tendency, such as the mean or median, for the set of coupling strength index
values that are stored as provided in (d) and (e);
(g) means for calculating and displaying for every cluster in (b) the percentage of times
that the central tendency calculated in (f) is larger than the corresponding central
tendency in (f), among those calculated repeatedly for corresponding random subsets
in (e);

whereby said percentage for each cluster provides an indication of the extent to
which documents associated with genes in said cluster collectively distinguish that
cluster from all the other clusters,

and whereby said percentage for each cluster provides an indication of the quality
of that cluster.
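
The coupling strength index in (d) and the percentage in (g) can be sketched as follows. This is a simplified illustration under stated assumptions, not the claimed means: the proportionality constant is taken as 1, the mean is the assumed central tendency, a single size-matched random subset is drawn per trial instead of re-partitioning the whole gene set, every subset is assumed to hold at least two genes, and all names (`coupling_indices`, `percent_cluster_exceeds_random`, `docs_by_gene`) are hypothetical.

```python
import random
from itertools import combinations
from statistics import mean

def coupling_indices(genes, docs_by_gene):
    """Number of shared documents for every pair of genes in one subset.

    docs_by_gene maps each gene to the set of documents associated with it.
    """
    return [len(docs_by_gene[a] & docs_by_gene[b])
            for a, b in combinations(sorted(genes), 2)]

def percent_cluster_exceeds_random(cluster, all_genes, docs_by_gene,
                                   trials=1000, seed=0):
    """Percentage of trials in which the cluster's mean coupling index is
    larger than that of a random gene subset of the same size."""
    rng = random.Random(seed)  # seeded so that repeated runs agree
    observed = mean(coupling_indices(cluster, docs_by_gene))
    wins = 0
    for _ in range(trials):
        subset = rng.sample(sorted(all_genes), len(cluster))
        if observed > mean(coupling_indices(subset, docs_by_gene)):
            wins += 1
    return 100.0 * wins / trials
```

A percentage close to 100 would indicate that genes in the cluster share literature far more often than randomly grouped genes do.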

Claim 4. (canceled) A method, in a computer system, of clustering gene expression data,
wherein data received for each one of a plurality of genes constitute the
response of said gene to an intervention at an initial time; and
wherein data received for each one of a plurality of said genes were collected
at a series of time points following the intervention; and
wherein data received for a plurality of said genes are proportional to the amount
of messenger RNA, x, for each one of said genes; and
wherein the rate of change of x, dx/dt, is represented as a synthesis rate, f,
minus the product of a degradation rate k and x, namely, -kx; and
wherein the f and k are both represented as being piecewise continuous,
time-varying functions at each of the measurement time points for each
one of said plurality of genes; and
wherein f and k are approximated as truncated Taylor series, the coefficients of which
are estimated using the received data x, at each of the measurement time points
for each one of said plurality of genes; and
wherein the estimated values of the synthesis rate, f, at each of the measurement
time points for each one of said plurality of genes are used to cluster said plurality

of genes;

whereby said clustering may reveal subsets of genes among the plurality of genes
that are regulated by the same transcription factors, as evidenced by the similarity
of their time-varying transcription rates.
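
The rate model recited in claim 4, dx/dt = f - kx, can be rearranged to f = dx/dt + kx, which the sketch below evaluates at each measurement time point. This is a deliberate simplification and not the claimed estimation: it assumes a known, constant degradation rate k and uses finite differences for dx/dt, whereas the claim estimates both f and k through truncated Taylor series. The function name `estimate_synthesis_rate` is hypothetical.

```python
def estimate_synthesis_rate(times, x, k):
    """Estimate f(t_i) = dx/dt(t_i) + k * x(t_i) at every time point.

    times: measurement time points, strictly increasing, at least two.
    x:     mRNA amounts measured at those time points.
    k:     degradation rate, assumed here to be known and constant.
    """
    n = len(times)
    f = []
    for i in range(n):
        # Central difference in the interior, one-sided at either end.
        lo, hi = max(i - 1, 0), min(i + 1, n - 1)
        dxdt = (x[hi] - x[lo]) / (times[hi] - times[lo])
        f.append(dxdt + k * x[i])
    return f
```

The resulting per-gene vectors of f values could then be passed to any clustering routine, so that genes with similar time-varying synthesis rates fall into the same cluster.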

Claim 5. (canceled) The method of claim 4, used as the means for partitioning a set
of genes into clusters in a system for annotating sets and subsets of genes.

Claim 6. (canceled) The method of claim 4, used as the means for partitioning a set
of genes into clusters in a system for evaluating the quality of gene clustering.